Abstract

Frequency-magnitude distributions, and their associated uncertainties, are of
key importance in statistical seismology. When fitting these distributions, the
assumption of Gaussian residuals is invalid since event numbers are both
discrete and of unequal variance. In general, the observed number in any given
magnitude range is described by a binomial distribution which, given a large
total number of events of all magnitudes, approximates to a Poisson
distribution for a sufficiently small probability associated with that range.
In this paper, we examine four earthquake catalogues: New Zealand (Institute of
Geological and Nuclear Sciences), Southern California (Southern California
Earthquake Center), the Preliminary Determination of Epicentres and the Harvard
Centroid Moment Tensor (both held by the United States Geological Survey).
Using independent Poisson distributions to model the observations, we
demonstrate a simple way of estimating the uncertainty on the total number of
events occurring in a fixed time period.

1 Introduction 

It is well documented that typical catalogues containing large numbers of
earthquake magnitudes are closely approximated by power-law or gamma frequency
distributions [1], [2], [3], [4]. This paper addresses the characterisation of
counting errors (that is, the uncertainties in histogram frequencies) required
when fitting such a distribution via the maximum likelihood method, rather than
the choice of model itself (for which see [5]). We follow this with an
empirical demonstration of the Poisson approximation for total event-rate
uncertainty [used in 5]. Our analysis provides evidence to support the
assumption in seismic hazard assessment that earthquakes are Poisson processes
[6], [7], [8], [9], which is routinely stated yet seldom tested or used as a
constraint when fitting frequency-magnitude distributions. Use is made of the
Statistical Seismology Library [10], specifically the data downloaded from the
New Zealand Institute of Geological and Nuclear Sciences (GNS,
http://www.gns.cri.nz), the Southern California Earthquake Center (SCEC,
http://www.scec.org) and the United States Geological Survey (USGS,
http://www.usgs.gov), along with associated R functions for extracting the
data.

Consider a large sample of $N$ earthquakes. In order to estimate the underlying
proportions of different magnitudes, which reflect physical properties of the
system, the data are binned into $m$ magnitude ranges containing $n_i$ events
($i = 1, \dots, m$) such that $\sum_{i=1}^{m} n_i = N$. Since the $n_i$ are
discrete, a Gaussian model for each $n_i$ is inappropriate and may introduce
significant biases in parameter estimations [11], [12], [13]. Hence when
fitting some relationship with magnitudes $M_i$, $n_i = f(M_i)$, linear
regression must take the generalised, rather than least-squares, form [14].
Weighted least squares is an alternative approach which we do not consider
here. The set of $n_i$ is described as a multinomial distribution; should we
wish to test whether two different samples $n$ and $n'$ are significantly
different given a fixed $N$ "trials", confidence intervals that reflect the
simultaneous occurrence of all $n_i$ must be constructed using a Bayesian
approach [15]. However, in the case of earthquake catalogues, it is the
temporal duration rather than the number of events that is fixed.
Observational variability is not, therefore, constrained to balance a higher
$n_i$ at some magnitude with a lower $n_j$ elsewhere, and the $n_i$ are well
approximated by independent binomial distributions [16].

Each incremental magnitude range $(M_i - \delta M/2, M_i + \delta M/2)$
contains a proportion of the total number of events and hence a probability
$p_i$ with which any event will fall in that range. Providing the overall
duration of the catalogue is greater than that of any significant correlations
between either magnitudes or inter-event times, $n_i$ can be modelled as a
binomial experiment with $N$ independent trials each having a probability of
"success" $p_i$ [16]. The binomial distribution converges towards the Poisson
distribution as $N \to \infty$ while $N p_i$ remains fixed. Various rules of
thumb are quoted to suggest values of $N$ and $p_i$ for which a Poisson
approximation may be valid; see for example [17], [18]. Here, we show
empirically in Sect. 2 that the frequencies in four natural earthquake
catalogues are consistent with a Poisson hypothesis, while in Sect. 3 we derive
the resulting Poisson distributions of the total numbers of events, which
provide simple measures of uncertainty in event rates.
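
The convergence of the binomial towards the Poisson distribution at fixed
$N p_i$ can be illustrated numerically. The following is a minimal sketch, not
part of the original analysis: the function names, the choice $N p_i = 20$ and
the trial numbers are illustrative assumptions only.

```python
import math


def binomial_pmf(k, n_trials, p):
    """Binomial probability of k successes, evaluated in log space."""
    log_pmf = (math.lgamma(n_trials + 1) - math.lgamma(k + 1)
               - math.lgamma(n_trials - k + 1)
               + k * math.log(p) + (n_trials - k) * math.log1p(-p))
    return math.exp(log_pmf)


def poisson_pmf(k, lam):
    """Poisson probability of k events with mean lam, evaluated in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def max_pmf_gap(n_trials, lam, k_max=60):
    """Largest pointwise gap between the two distributions for k = 0..k_max."""
    p = lam / n_trials  # keep N * p fixed while N grows
    return max(abs(binomial_pmf(k, n_trials, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))


# Hypothetical bin with an expected count of 20 events.
for n_trials in (100, 1000, 10000):
    print(n_trials, max_pmf_gap(n_trials, lam=20.0))
```

The printed gap shrinks as the number of trials grows, which is the behaviour
the rules of thumb cited above try to summarise.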

2 Frequency-magnitude Distributions 

Four earthquake catalogues are analysed: New Zealand (1460 - Mar 2007),
Southern California (Jan 1932 - May 2007), the Preliminary Determination of
Epicentres (PDE, Jan 1964 - Sep 2006) and the Harvard Centroid Moment Tensor
(CMT, Jan 1977 - June 1999, <100 km focal depth). While we impose no additional
temporal or spatial filters on the raw data, magnitude limits are chosen to
minimise the effects of incompleteness at lower magnitudes and undersampling of
higher magnitudes. Following [5], who demonstrate the use of an objective
Bayesian information criterion for choosing between functions, we seek to fit
each catalogue with either a single power-law distribution

$$\log_{10} n = a - bM, \qquad (1)$$

$M$ being already on a log scale, or a gamma distribution

$$\log_{10} n = a - bM - c \exp(kM), \qquad (2)$$

where $a$, $b$, $c$ and $k$ are constants. The gamma distribution consists of a
power law of seismic moment or energy at the lower magnitudes followed by an
exponential roll-off. Unlike pure power laws, its integration is finite and so
it represents a physical generalisation of the Gutenberg-Richter law; for
examples see [19] and references therein. For internal consistency, the Poisson
assumption in [5] is indeed valid as we now demonstrate.
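
As a minimal sketch of the two candidate models (Eqs. 1 and 2), the functions
below return $\log_{10} n$ for a given magnitude. The function names and the
parameter values in the example are hypothetical and are not fitted to any of
the four catalogues.

```python
import math


def log10_n_power_law(mag, a, b):
    """Eq. 1: log10 of the expected frequency under a pure power law."""
    return a - b * mag


def log10_n_gamma(mag, a, b, c, k):
    """Eq. 2: power law with an exponential roll-off at high magnitudes."""
    return a - b * mag - c * math.exp(k * mag)


# Hypothetical parameters chosen only to show the roll-off.
for mag in (5.0, 6.0, 7.0, 8.0):
    pure = log10_n_power_law(mag, a=8.0, b=1.0)
    rolled = log10_n_gamma(mag, a=8.0, b=1.0, c=1e-6, k=1.7)
    print(mag, round(pure, 3), round(rolled, 3))
```

Setting $c = 0$ in the gamma form recovers the pure power law, so the two
models are nested.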

As explained in Sect. 1, generalised linear regression is required since we
have non-Gaussian counting errors on each bin. To test the consistency of
these counting errors with the Gaussian, binomial and Poisson distributions,
the residuals (observations minus chosen fit) are normalised to their 95%
confidence intervals and plotted in Fig. 1. In all four catalogues, the
binomial and Poisson residuals are almost indistinguishable, and show no
significant deviation from the expected 1 in 20 exceedance rate when counting
those points that lie outside the 95% confidence limits. Equal bin widths
$\Delta M = 0.1$ are used as is common practice in earthquake hazard analysis;
while this underestimates the intrinsic physical uncertainty of earthquake
magnitude determination, for the present purposes the Poisson model appears to
be a good proxy. At least, for the catalogues considered here and with
$\Delta M = 0.1$, the Poisson model is valid. By way of a further check, the
value $b$ of the fitted power-law slope (Eqs. 1, 2) given binomial errors is,
to two significant figures, equal to that given Poisson errors, for all four
catalogues. Constant Gaussian errors systematically overestimate frequency
uncertainties on the smaller magnitudes, leading to differences in $b$ of
+10% and -30% respectively for the Southern California and PDE data (see
caption of Fig. 2). These are caused by over-weighting the exponential
components of the gamma distributions and exemplify worst-case results of
incorrect error structures. In Fig. 2 then, we need only plot the fits and
uncertainties using the Poisson model. Let us now describe, in Sect. 3, the
usefulness of this result for estimating event-rate uncertainties.
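
A generalised linear fit of Eq. 1 with Poisson counting errors can be sketched
as follows. This is an illustrative implementation only, assuming a log link
and Newton iterations on the Poisson log-likelihood; it is not the code used
for the fits reported here, and the function name, starting values and
synthetic data are all assumptions.

```python
import math


def fit_power_law_poisson(mags, counts, iters=50):
    """Fit log10(n) = a - b*M by Poisson maximum likelihood; returns (a, b)."""
    # Work in natural logs: lambda_i = exp(alpha + beta * M_i).
    # Starting values come from least squares on log(count + 0.5).
    ys = [math.log(c + 0.5) for c in counts]
    m = len(mags)
    mean_x = sum(mags) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in mags)
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(mags, ys)) / sxx
    alpha = mean_y - beta * mean_x
    for _ in range(iters):
        lam = [math.exp(alpha + beta * x) for x in mags]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(c - l for c, l in zip(counts, lam))
        g1 = sum((c - l) * x for c, l, x in zip(counts, lam, mags))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(lam)
        h01 = sum(l * x for l, x in zip(lam, mags))
        h11 = sum(l * x * x for l, x in zip(lam, mags))
        det = h00 * h11 - h01 * h01
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < 1e-12:
            break
    ln10 = math.log(10.0)
    return alpha / ln10, -beta / ln10


# Noise-free synthetic expected counts from a hypothetical law a = 7, b = 1,
# in bins of width 0.1 (not taken from any catalogue).
mags = [4.0 + 0.1 * i for i in range(30)]
counts = [10.0 ** (7.0 - 1.0 * mag) for mag in mags]
a_hat, b_hat = fit_power_law_poisson(mags, counts)
print(round(a_hat, 4), round(b_hat, 4))
```

Because the Poisson variance equals the mean, well-populated bins at low
magnitude carry small relative uncertainty and dominate the fit, in contrast
to an unweighted Gaussian regression on $\log_{10} n$.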

3 Event-rate Uncertainties 

Figure 1: Residuals of fitted frequency-magnitude distributions from
GNS/SCEC/USGS catalogues: (a) New Zealand, (b) Southern California, (c)
PDE, (d) CMT. (solid line) Best fit to Eq. 1 or 2; (dashed lines) 95%
confidence limits of respective distribution.

Figure 2: Frequency-magnitude distributions from GNS/SCEC/USGS catalogues.
(solid line) Best fit to Eq. 1 or 2: (a) New Zealand, power law b = 1.0;
(b) Southern California, gamma b = 0.91; (c) PDE, gamma b = 0.91; (d) CMT,
gamma b = 0.85. (dashed lines) 95% Poisson confidence limits. Unweighted
Gaussian regression leads to b-value estimates of (a) 0.98, (b) 1.03, (c) 0.66,
(d) 0.83.

Having established that independent Poisson distributions characterise the
magnitude frequencies in these four catalogues (importantly, these data span
sufficiently large times and distances as to minimise dependencies due to
clustering), we now ask how this impacts on uncertainties in total numbers of
events. While we cannot create equivalent catalogues by re-sampling the same
regions under the same physical conditions, we can simulate S = 10^ samples
from each magnitude range by keeping the fitted mean $\lambda_i$ constant
(representing the underlying reality) and using the Poisson estimate
$\sigma_i^2 = \lambda_i$ to capture the observational variance. Summing these
realisations, one per bin over all magnitudes, provides a large set of
plausible alternative totals. Figure 3 shows histograms of these simulated
totals for each of the four catalogues, fitted with Poisson distributions for
reasons we now explain.
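
The simulation of alternative totals can be sketched as below. This is a
minimal illustration under stated assumptions rather than the procedure used
to produce Fig. 3: the bin means are hypothetical, the number of samples is
arbitrary, and the Poisson sampler (Knuth's multiplication method, applied in
chunks so that the exponential does not underflow) is one simple choice among
many.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson variate with mean lam using Knuth's method."""
    total = 0
    remaining = lam
    # A Poisson variate with a large mean is built as a sum of independent
    # Poisson variates with means of at most 500.
    while remaining > 0:
        chunk = min(remaining, 500.0)
        limit = math.exp(-chunk)
        k = 0
        prod = rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        total += k
        remaining -= chunk
    return total


def simulate_totals(bin_means, n_samples, seed=0):
    """Sum one Poisson draw per bin, repeated n_samples times."""
    rng = random.Random(seed)
    return [sum(poisson_draw(lam, rng) for lam in bin_means)
            for _ in range(n_samples)]


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var


# Hypothetical fitted bin means (not from any of the four catalogues).
bin_means = [50.0, 20.0, 8.0, 3.0, 1.0]
totals = simulate_totals(bin_means, n_samples=4000)
mean, var = mean_and_variance(totals)
print(sum(bin_means), round(mean, 2), round(var, 2))
```

Both the sample mean and the sample variance of the simulated totals should
lie close to the sum of the bin means.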

It is straightforward to show analytically that the sum of independent Poisson
variables is itself Poisson with a mean (and hence variance) equal to the sum
of the component means $\lambda_i$ [20]. This result holds for (i) any number
of independent Poisson variables (in the current context, bins) with (ii) any
relationship $\lambda_i = f(M_i)$, since the result is independent of $f(M)$.
In the case of earthquakes placed into bins of width $\Delta M$ at magnitudes
$M_i$, for example, $f(M)$ is commonly fitted by a power-law or gamma
distribution as in Fig. 2. From the Poisson property $\sigma^2 = \lambda$, it
follows that

$$\sigma_N^2 = \lambda_N = \sum_{i=1}^{m} \lambda_i = \sum_{i=1}^{m} f(M_i). \qquad (3)$$
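
One standard route to the additivity behind Eq. 3 uses probability generating
functions; the sketch below is offered as an illustration and is not
necessarily the derivation given in the cited reference. Here $G$ denotes the
generating function and $z$ its dummy argument, both introduced only for this
purpose.

```latex
% Generating function of a single Poisson count n_i with mean \lambda_i
G_{n_i}(z) = \mathrm{E}\left[z^{n_i}\right]
           = \sum_{k=0}^{\infty} z^{k} \, \frac{e^{-\lambda_i} \lambda_i^{k}}{k!}
           = \exp\left[\lambda_i (z - 1)\right]

% For independent bins the generating function of the total N is the product
G_{N}(z) = \prod_{i=1}^{m} G_{n_i}(z)
         = \exp\left[(z - 1) \sum_{i=1}^{m} \lambda_i\right]
```

The final expression is the generating function of a Poisson variable with
mean $\sum_i \lambda_i$, whatever the form of $f(M)$.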



Thus we have a useful result: if there exists a physically justifiable function
that provides a satisfactory fit to the histogram (that is, Poisson-distributed
uncorrelated residuals as in Fig. 1) then the mean and variance of the total
number of events, over different realisations of the catalogue, are both equal
to the sum of the fitted values (Eq. 3). For the simulations of our four
example catalogues (Fig. 3), we have mean total event numbers of
$\lambda_N$ = 19231, 17491, 46454, 9301 respectively; these match the actual
observed totals to an accuracy of ±1. Empirical evaluations confirm
$\sigma_N = \sqrt{\lambda_N}$ to two significant figures, hence our estimated
uncertainties on total event numbers for these catalogues are
$\sigma_N$ = 140, 130, 220, 96. Since (i) a Poisson distribution converges
towards a Gaussian as $\lambda \to \infty$, (ii) a reasonable approximation to
this exists where $\lambda > 5$ and $S - \lambda > 5$ for sample size $S$ [20],
and (iii) we have S = 10^ with $\lambda_N$ given above, it is not surprising
that the Poisson confidence intervals for $\lambda_N \pm \sigma_N$ are (to two
significant figures) 68% as in the Gaussian case.
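
The quoted uncertainties and the coverage of the one-sigma interval can be
checked directly from the four totals given above. The following is a minimal
sketch; the helper names are hypothetical, and the coverage is computed by
summing the exact Poisson probabilities of the integers lying within
$\lambda_N \pm \sigma_N$.

```python
import math


def round_sig(x, sig=2):
    """Round x to the given number of significant figures."""
    digits = sig - 1 - int(math.floor(math.log10(abs(x))))
    return round(x, digits)


def poisson_log_pmf(k, lam):
    """Natural log of the Poisson probability of k events with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def coverage_one_sigma(lam):
    """Poisson probability that a count lies within lam +/- sqrt(lam)."""
    sigma = math.sqrt(lam)
    lowest = math.ceil(lam - sigma)
    highest = math.floor(lam + sigma)
    return sum(math.exp(poisson_log_pmf(k, lam))
               for k in range(lowest, highest + 1))


# Mean total event numbers quoted in the text for the four catalogues.
totals = [19231, 17491, 46454, 9301]
sigmas = [round_sig(math.sqrt(t)) for t in totals]
print(sigmas)  # [140.0, 130.0, 220.0, 96.0]
print([round(coverage_one_sigma(t), 2) for t in totals])
```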

4 Conclusions 

The purpose of this paper is to draw attention to the simplicity with which one 
can formally estimate event-rate uncertainties for applications in seismic hazard 
analysis, both in small magnitude ranges and over whole catalogues. For each of 
the four earthquake catalogues considered here, we find that the best estimate 
of both the mean and the variance of the total number of events, is equal to 
the total calculated from the fit to the histogram. This approximation holds 
where (i) the residuals of the fit are independently Poisson distributed, and 
(ii) the overall duration of the catalogue is greater than that of any significant 
correlations between either magnitudes or inter-event times. Note that the ratio 
of binomial-to-Poisson variance for any frequency $n$ is
$\sigma_b^2 / \sigma_p^2 = 1 - p_n < 1$, which implies that the Poisson
approximation provides an upper bound for the uncertainty on the total event
rate should any residuals generalise to the binomial case. However,
correlations between inter-event times could cause significant future changes
in event rates, greater than predicted by the naive estimates of uncertainty
presented here, and this is the subject of further study.

Figure 3: Event-rate distributions from 10^ simulated realisations of
GNS/SCEC/USGS catalogues. Each total event-rate is the sum of a random
sample of frequencies, one per bin, given Poisson uncertainties shown in Fig. 2:
(a) New Zealand, (b) Southern California, (c) PDE, (d) CMT. (solid line) Best
fit Poisson distribution; (dashed lines) 99% binomial confidence limits.